A Perfectly Invertible Rank Order Coder 



Abstract. Our goal is to revisit rank order coding by proposing an orig- 
inal exact decoding procedure for it. Rank order coding was proposed 
by Simon Thorpe et al. to explain the impressive performance of our 
visual system in recognizing objects. It is based on the hypothesis that 
the retina represents the visual stimulus by the order in which its cells 
are activated. A rank order coder/decoder was then designed involving 
three stages [H]: (i) A model of the stimulus transform in the retina 
consisting in a redundant filter bank analysis; (ii) A sorting stage of the 
filters according to their activation degree; (iii) A straightforward decod- 
ing procedure consisting in a weighted sum of the most activated filters. 
Focusing on this last stage, it appears that the decoding procedure em- 
ployed yields reconstruction errors that limit the model Rate/Quality 
performances when used as an image codec. The attempts made in the 
literature to overcome this issue are time consuming and alter the cod- 
ing procedure, or are lacking mathematical support and are infeasible for 
standard size images. Here we solve this problem in an original fashion 
by using the frames theory, where a frame of a vector space designates 
an extension for the notion of basis. First, we prove that the analyzing 
filter bank considered is a frame, and then we define the corresponding 
dual frame that is necessary for the exact image reconstruction. Second, 
to deal with the problem of memory overhead, we design a recursive 
Out-Of-Core blockwise algorithm for the computation of this dual frame. 
Our work provides a mathematical formalism for the retinal model under 
study and specifies a simple and exact reverse transform for it. Further- 
more, the framework presented here can be extended to several models 
of the visual cortical areas using redundant representations. 



1 Introduction 

Neurophysiologists made substantial progress in better understanding the early 
processing of the visual stimuli. Especially, several efforts proved the ability of 
the retina to code and transmit a huge amount of data under strong time and 
bandwidth constraints |11I8| . The retina does this by means of a neural code 
consisting in a set of electrical impulses: the spikes. Based on these results, 
our aim is to use the computational neuroscience models that mimic the retina 
behavior to design novel lossy coders for static images. 

Although there is no clear evidence on how the spikes encode for the visual 
stimuli, we assume in this paper that the relevant encoding feature is the order 
in which the retina cells emit their first spike given the stimulus onset. This 
choice was motivated by Thorpe et al. neurophysiologic results on ultra-rapid 
stimulus categorization |ll|12j . Authors showed that still image classification 
can be achieved by the visual cortex within very short latencies of about 150 ms 



or even faster. As an explanation, it was stated that: There is information in 
the order in which the cells fire^ and thus the temporal ordering can be used as 
a code. This code termed as rank order coding (ROC) was further studied and 
yielded the conception of a classical retina model [H] . 

However, one major limitation of the ROC coder defined in [H] prevents us 
from its use for the design of image codecs. It is the inaccuracy of the proposed 
decoding procedure. Indeed, the bio-inspired retina model that generates the 
spikes is based on a redundant filter bank image analysis, where the considered 
filters are not strictly orthogonal. Thus, the filter overlap yields reconstruction 
errors that limit the Rate/Quality performance |9|2|7| . Efforts to correct this 
issue followed two main approaches. A first one consisted in inverting directly 
the transform operator to obtain a reverse filter bank as in [112]. The method 
presented is based on a pseudo-inversion. Though interesting, we will prove that 
this method lacks mathematical support. Besides the procedure used deals with 
a high dimension matrix and thus is infeasible, as such, for standard size images. 
A second approach relies on Matching Pursuit (MP) algorithms as in [9JJ. These 
methods are time consuming and alter the coding procedure. In addition, the 
MP approach depends on the order in which the "match and update" mechanism 
is performed, and this makes the coding procedure depend on the stimulus itself. 

In this paper, we give an original solution relying on the mathematical con- 
cept of frames and dual frames, which is an extension of the notion of basis [6]. 
First, we express the model in a matrix-based fashion. Then we prove that the 
analyzing filter bank as we define it is a frame. Finally, we construct the corre- 
sponding dual frame that is necessary for the exact image reconstruction. Besides 
we design an out-of-core algorithm for the computation of the dual frame which 
resolves the encountered issues of memory overhead. Our method requires the 
computation of a reverse operator once for all and keeps the bio-inspired coding 
scheme unchanged. Thanks to this approach, we show that the reconstructed 
image that we get is equal to the original stimulus. 

This paper is organized as follows: In Section O we present the three stages 
of the rank order coding/decoding method. Then in Section [3] we define an 
exact decoding scheme through the construction of a dual frame. Finally in 
Section m we summarize the results and show the gain that we obtain in terms 
of Rate/Quality tradeoff. 

2 The rank order coder/decoder: Three stages 

2.1 The image transform: A bio-inspired retina model 

Neurophysiologic experiments have shown that, as for classical image coders, the 
retina encodes the stimulus representation in a transform domain. The retinal 
stimulus transform is performed in the cells of the outer layers. Quantitative 
studies have proven that the outer cells processing can be approximated by a 
hnear filtering. In particular, the authors in proposed the largely adopted 
DoG filter which is a weighted difference of spatial Gaussians that is defined as 
DoG{x^y) = w^Gcjc{x^y) — w^Gcjs{x^y)^ where and are the respective 



positive weights of the center and surround components of the receptive fields, 
and are the standard deviations of the Gaussian kernels and G^^^ 
such that < cr^ . The DoG cells can be arranged in a dyadic grid F of K layers 



to sweep all the stimulus spectrum as shown in Figure 1(b) [ 14...9.7j . Each layer 
^ k < K in the grid, is tiled with filtering cells having a scale k and generating 
a transform subband 5/^, that we denote by DoGk- 

DoGk{x, y) = w'^Gai ^) " ^'^^i (1) 

where (t^-^i = ^cr^ and cr^_^-^ = ^cr^. Each DoGk filter has a size of 
{2Mk + 1) X {2Mk + 1), with Mk = 3cr^. Authors in ^ chose the biologically 
plausible parameters w'^ = = 1, = V/c, and cr|^_i = 0.5 pixel. 

In order to measure the degree of activation Ckij of a given retina cell, such 
that {k^i^j) G we compute the convolution of the original image / by the 
DoGk filter. Yet each layer k in the dyadic grid F is undersampled with a step 
of 2^~^~^ pixels with an original offset of [2^~^~^J pixels, where [.J is the floor 
operator. Having this, we define the function Uk^ such that the Ckij coefficients 
are computed at the locations (uk{i)^Uk{j)) as follows: 

Uk^) = L2^-^-'j +2^-^-ii,Vfe G I0,i^- 11. (2) 

Uk can be seen as an undersampling function. We notice that UK-i{i) = ^, and 
that the other functions {uk)kelo,K-2j undersampled versions of uk-i- Ckij 
is then computed as follows: 

^kij= ^ ^ DoGk{uk(i) - x,Uk{j) -y) f{x,y). (3) 

The transform specified generates a vector c of approximatively ^N'^ Ckij coef- 
ficients for an A^^-sized image, as the architecture of this transform is similar to 
a Laplacian pyramid [3^ . An example of such a transform is shown in Figure [TJ 



2.2 Sorting: The generation of the rank order code 

Thorpe et al |ll|12j proposed that the order in which spikes are emitted encodes 
for the stimulus. This yielded the ROC which relies on the following simplifying 
assumptions: (i) From stimulus onset, only the first spike emitted is considered; 
(ii) The time to fire of each cell is proportional to its degree of excitation; (iii) 
Only the order of firing of the neurons encodes for the stimulus. 
Such a code gives a biologically plausible interpretation to the rapidity of the 
visual stimuli processing in the human visual system. Indeed, it seems that most 
of the processing is based on feed-forward mechanism before any feedback oc- 
curs [12]. So, the neurons responses {ckij)kijer defined in Equation ([3]) are sorted 
in the decreasing order of their amplitude \ckij\. 

The final output of this stage, the ROC, is then a sorted list of Ns couples 
ip^Cp) such that \cp\ ^ with p being the index of the cell defined by: 

p{k, i, j) = k A"! + i Nk + j and N'^ the number of cells in the subband Bk- Here, 
the generated series {PiCp)^^ is the only data transmitted to the decoder. 
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Fig. 1. Illustration of the retinal image transform applied on cameraman 1(a) 
Image size is 257x257 pixels. 



|l(b) 



the image analysis (from [H]). 1(c) 



Example of a dyadic grid of DoG's used for 
The transform result showing the subbands. 



2.3 Decoding procedure of the rank order code 

We consider the set of the first Ns highest cell responses forming the ROC of a 
given image /. In the authors defined /at^, the estimation of / by: 

fNs{xiy)= CpDoGk{uk{i) -x,Uk{j) -y). (4) 

Equation (|4]) gives a progressive reconstruction depending on Ns. Indeed, one 
can restrict the code to the most valuable coefficients Cp, i.e the most activated 
cells of the retina. This feature makes the coder be scalable [7]. 

An example of such a reconstruction is given in Figure [21 with all the retina 
cells taken into account. Figure [2] also shows that the retina model decoding pro- 
cedure, though giving a good approximation of the stimulus, is still inaccurate. 
In this example, reconstruction quality is evaluated to 27 dB of PSNR. 

3 Inverting the bio-inspired retina model 
3.1 Introduction of a low-pass scaling function 

We introduce a low-pass scaling function in the filter bank used for image anal- 
ysis. This modification do not alter the ROC coder architecture and has both a 
mathematical and a biological justification. 

Indeed, the Fourier transform of a Gaussian is another Gaussian, so that 
^{DoGk) is a difference of Gaussians. We can easily prove, for w'^ = = 1 
(cf. ([1])), that the central Fourier coefficient ^{DoGk){uo{0),uo{0)) = 0, and 
that ^{DoGk){i^ j) > 0, 7^ (2io(0), ^io(O)) . This assertion can be verified 

on Figure [3(b)| So, we replaced the DoGq filter, with no change in the notation, 
by a Gaussian low-pass scaling function consisting in its central component, 
with: 

DoGo{x,y) =w''Ga^^{x,y). (5) 
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Fig. 2. Result of the decoding procedure with the original approach using all 
of the retina cells responses. 2(a) Reconstructed image. The PSNR quahty 



measure of /at, yields 27 dB. |2(b)[ Error image: high frequencies are the ones that 



are the most affected by this approach, 
scale. 



2(c) Same as (b) but in a logarithmic 



We do this to cover up the centre of the spectrum which otherwise won't be 
mapped. Figures 3(a)| and |3(b)| show the spectrum partitioning with the DoG 



filters and the plug added by the scaling function DoGq (in black thick line). 
With no scaling function, all constant images would be mapped into the null 
image and this would make the transform be non-invertible. Here we overcome 
this problem as the central Fourier coefficient ^{DoGq){u{){^)^uq{())) > 0. 

The scaling function introduction is further justified by the actual retina 
behavior. Indeed, the surround Ga^ in ([T]) appears progressively across time 
driving the filter passband from low frequencies to higher ones. So that, the 
Gaussian scaling function represents the very early state of the retina cells. 

Once we introduced the DoGq scaling function, and in order to define an 
inverse for the new transform we must demonstrate that it is a "frame". This 
will be detailed in Sections 13.21 and 13.31 

3.2 Analysis and synthesis of an image using matrices 

Unlike the implementations in |14|9|7j , we use a matrix <P to compute the image 
transform through the modelled retina. The lines of <P are the different analyzing 
DoGk filters. This yields an "undersampled Toeplitz-block" sparse matrix (see 
Figure [S^cJ ). Such an implementation allow fast computation of the multi-scale 



retinal transform through sparse matrix specific algorithms. This will in addition 
help us construct the dual frame of ^. The DoG transform is outlined in the 
following equation: 

c = ^f. (6) 

Interestingly, the straightforward synthesis as defined in (j4]) amounts to the 
multiplication of the vector output c by ^* the Hermitian transpose of ^. The 
reconstruction procedure is then outlined in the following equation: 



In, = ^* c. 



(7) 
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Fig. 3. |3(a) Spectrum of the DoG filters. The abscissa represents the frequen- 
cies. The ordinate axis represents the different DoGk filters gain in dB.[3(b)| Half 
of the spectrum in 3(a)] with the abscissa having a logarithmic step. The scaling 
function DoGq is plotted in black thick line. |3(c) In this paper, the transform 
^ is represented as a matrix where blue dots correspond to a non-zero element. 
Note here that ^ is a highly sparse matrix. 



3.3 The DoG transform is a frame operator 

Our aim is to prove that the bio-inspired retina image transform presented 
amounts to a projection of the input image / onto a frame of a vector space. 
The frame is a generalization of the idea of a basis to sets which may be linearly 
dependent [6]. These frames allow a redundant signal representation which, for 
instance, can be employed for coding with error resilience. By proving the frame 
nature of this transform, we will be able to achieve an exact reverse transform 
through the construction of a dual frame. According to [6], to prove that our 
transform is a frame one needs to show that 30<a^/3<oo, such that: 

«ll/f < E (cfc..)'</9|l/f, V/. (8) 

kijer 

For example, authors in p!0] proved that the Laplacian pyramid [31 is a frame and 
estimated experimentally a and /3. The authors in [4 also designed an invertible 
Laplacian pyramid based on the same principles. Though there are some simi- 
larities, the DoG transform do not use the same filters and do not work in the 
same fashion. Thus, in the following, we give an original demonstration in our 
specific case which proves that the DoG transform is a frame, this furthermore 
with a mathematical formalism. 

Let us prove these two inequalities of the so-called "frame condition" . 

Proof. Upper bounding: We have: 



kijer 



(9) 



where Bk is the subband of scale k generated by the image transform with: 
Bk{iJ)= ^ DoGk{uk{i) - x,Uk{j) -y) f{x,y). 

x^Uk{i)-Mk y=Uk{j)-Mk 

If we denote by Uk the undersamphng operator corresponding to the function 
Uk (cf. Equation (|2])), we can write the fohowing: 

Bk = Uk{DoGk^f). (10) 
Then, we have the fohowing obvious inequahties: 

IIBfcll = \\Uk{DoGk*f)\\ < \\Uk {\DoGk\ * |/|)|| < \\\DoGk\ * \f\\\ ^ \\DoGk\\ ||/||. 
So, with we infer the fohowing bounding: 

K-l /K-i \ 

E (^fc*^-)' = E ii^'^ii' ^ E ii^^^fcii' 11/11' = ^11/11'' (11) 

kijer k=0 \k=0 / 

which shows the first inequahty. 

Lower bounding: We start from the fact that: 

Y^\\B,f;,\\BK-if + \\Bof, (12) 

k=0 

which amounts to write the fohowing inequahties: 

E {Ck^jf>\\DoGK-l*ff+\\{DoGo*f){uo{0),Uo{0))\f 

kijer 

= \\^{DoGK-i)^{f )\? + II {^{DoGo) nf)) K(0), uo{0)) f 

N-1 

= ^ {^{DoGK-i){i.j)^{m.j))' ^\\^{DoGo){uo{0).uo{0^^ 

Weknowthdit DoGK-i{i J) > 0, + (^/o(0), ^io(O)) and that I:>oGk-i (^o(O), ^io(O)) = 

0. We also have ^(L>oGo) (iio(O), iio(O)) > (cf. Section [3l]) . So, if we define a 
real a by: 

a = mm{{^(I^oGo)'(iio(0),iio(0))}u{^(I^oGK-i)'(^,j), (^,j) G [0, TV - if \(2io(0), iio(O)) } }, 
then we have that a > 0, and we get the following inequalities: 

AT-l 

E (=F(Z)oGK-i)(i,j)^(/)(i,j))' + ll^(i>oGo)(«o(0),«o(0))^(/)(tio(0),«o(0))f 

i,j=0 

= E (^PoGK-i)(i,jO^(/)(i,j))' + ||^poGo)(^.o(0),^.o(0))^(/)(«o(0),^.o(0))f 

i,je[0,N-lf \(«o(0),«o(0)) 

E('^(/)(^'i))'="ll/f> 
i,ielo,iv-if 



so that, ^j^ij^p i^kij)'^ ^ which concludes the proof. 

3.4 Synthesis using the dual DoG frame 

We introduce in this section a correction means for the reconstruction error in 
the retina model presented through the frame theory. 

The straightforward analysis/synthesis procedure can be outlined in the re- 
lation between the input image and the reconstruction estimate: /at^ = (^*(^/. 
As we already demonstrated that the DoG transform is a frame, is said 
to be the frame operator. To have an exact reconstruction of /, one must con- 
struct the dual DoG vectors. A preliminary step is to compute {^*^)~^^ the 
inverse frame operator. We then get a corrected reconstruction , defined by: 
f^^ = {^*^)~^ fN^' If Ns is the total number of the retina model cells, we have: 

In. = (^*^)"7iv. = ((^*^)-^^*) c = ^ / = /. (13) 

As made clear through Equation (p!3|) . the dual vectors are the lines of 
If we reconstruct f^^ starting from the ROC output c and using the dual frame 
vectors, we get the results shown in Figure SI The reconstruction obtained is 
accurate and requires only a simple matrix multiplication. In this example, re- 
construction quality is evaluated to 296 dB of PSNR. 

is a square, definite positive invertible matrix [6j. Another advantage 
of our method is that {<P*<P)~^ is well conditioned, with a conditioning number 
estimated to around 16, so that its inversion is stable. This is a crucial issue as 
previous works aimed at conceiving the DoG reverse transform tried to invert 
the original filter bank with no scaling function DoGq |l|2j . This is obviously 
mathematically incorrect as the filter bank thus defined is not a frame and 
thus its pseudo inverse ((^*(^)~^^* does not exist. The solution proposed by the 
authors of |l|2j gives only a least squares solution to an ill-conditioned problem. 
Our method instead is stable. Besides through the out-of-core algorithm that we 
designed we can invert even for large images whereas authors in |l|2j are 

restricted to a maximum size of 32 x 32. 

Furthermore, correcting the reconstruction errors using the adequate dual 
frame does not alter the coding procedure. Indeed, methods introduced in |9|2| 
require a Matching Pursuit (MP) procedure. MP is time consuming and depends 
on the order in which the "match and update" mechanism is performed. Our 
method keeps the coding procedure straightforward, fast and order-independent. 

The recursive out-of-core blockwise inversion algorithm: Though the mathemat- 
ical fundamentals underlying this work are simple, the implementation of such 
a process is a hard problem. Indeed, in spite of the sparsity of ^ and ^?*, the 
frame operator <P'^<P is an A^^-sized dense matrix for an A/'^-sized image /. For 
instance, if A' = 257, holds in 16 Gbytes, and 258 Gbytes if A" = 513. The 
solution is to recourse to the so-called out-of-core (OOC) algorithms [13] . 



Fig. 4. Result of the decod- 
ing procedure with the dual 
DoG frame using all of the 
retina cells responses. ID^a): 
Reconstructed image. The 
PSNR quality measure of 
/^^ yields 296 dB. g^b): 
Error image in logarith- 
mic scale. This shows that 
the reconstruction using the 
dual frame is very precise. 



The frame operator is constructed block by block, and each block is 
stored separately on disk. The inversion is then performed using a recursive 
algorithm that relies on the block- wise matrix inversion formula that follows: 

f A _ f A-^ ^A-^ BQ-^CA-^ -A'^BQ'^ 

\CDJ -\ -Q-'CA-' Q-i 

where Q is the Schur complement of A, such that: Q = D — C A~^ B. Thus, 
inverting a matrix amounts to the inversion of two matrices that are 4 times 
smaller. The inversion consists then in subdividing the problem by a factor 4 
at each recursion level until we reach a single block problem. Obviously, this al- 
gorithm requires OOC blockwise matrix routines for multiplication, subtraction 
and addition, that we implemented in parallel to accelerate the computation. 

4 Comparison to the original rank order codec 

We experiment our new decoder in the context of scalable image decoding. We 
reconstruct the cameraman test image using an increasing number Ng of sig- 
nificant coefficients (cf. Equation (j4|)). We compare the results when using the 
original DoG filters in ^ and the dual DoG filters (^*^)-^^* for the decoding 
procedure. Figure [5] summarizes the results obtained, with the upper line showing 
the progressive straightforward reconstruction /tv^ and the bottom line showing 
the corrected progressive reconstruction /^^. The gain in PSNR is significant 
for low rates (around 0.3 dB) and increases with Ng up to 270 dB. Besides, our 
method does not alter the coding procedure and keeps it straightforward, fast 
and order-independent. 
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